AITopics | symbolic dynamic programming

Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Neural Information Processing Systems, Apr-6-2023, 12:16:20 GMT

Partially-observable Markov decision processes (POMDPs) provide a powerful model for real-world sequential decision-making problems. In recent years, point-based value iteration methods have proven to be extremely effective techniques for finding (approximately) optimal dynamic programming solutions to POMDPs when an initial set of belief states is known. However, no point-based work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of possible observations, there are only a finite number of observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is known. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how this solution can be further extended via recently developed symbolic methods to continuous state and observations to derive the minimal relevant observation partitioning for potentially correlated, multivariate observation spaces.
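For readers new to point-based value iteration, the following is a minimal sketch of a single point-based backup for a small discrete POMDP: each current alpha-vector is projected back through every action and observation, and only the projection that scores best at the given belief point is kept. It is not the paper's method; the continuous-state, continuous-observation backups described above replace these sums with symbolic integrals over piecewise functions. The list-of-lists layout T[a][s][s2], Z[a][s2][o], R[a][s] and the function names are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(
    belief: Vector,
    alphas: List[Vector],
    T: List[List[Vector]],  # T[a][s][s2] = P(s2 | s, a)  (assumed layout)
    Z: List[List[Vector]],  # Z[a][s2][o] = P(o | s2, a)  (assumed layout)
    R: List[Vector],        # R[a][s]     = immediate reward
    gamma: float,
) -> Vector:
    """Return the best new alpha-vector at a single belief point (one PBVI backup)."""
    n_states = len(belief)
    n_obs = len(Z[0][0])
    best_alpha: Vector = []
    best_value = float("-inf")
    for a in range(len(R)):
        new_alpha = list(R[a])
        for o in range(n_obs):
            # Back-project every alpha-vector through (a, o):
            # g(s) = sum_{s2} P(s2|s,a) * P(o|s2,a) * alpha(s2)
            projections = [
                [
                    sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this particular belief point.
            g_best = max(projections, key=lambda g: dot(g, belief))
            for s in range(n_states):
                new_alpha[s] += gamma * g_best[s]
        value = dot(new_alpha, belief)
        if value > best_value:
            best_alpha, best_value = new_alpha, value
    return best_alpha
```

With two states, identity transitions, perfect observations, zero reward, gamma = 1 and the alpha-vectors [1, 0] and [0, 1], the backup at belief [0.5, 0.5] returns [1.0, 1.0]: each observation reveals the state, so the best vector can be chosen separately for each outcome.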

continuous state and observation pomdp, observation space, symbolic dynamic programming

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Zamani, Zahra | Sanner, Scott | Poupart, Pascal | Kersting, Kristian

Neural Information Processing Systems, Feb-14-2020, 22:44:02 GMT

Partially-observable Markov decision processes (POMDPs) provide a powerful model for real-world sequential decision-making problems. In recent years, point-based value iteration methods have proven to be extremely effective techniques for finding (approximately) optimal dynamic programming solutions to POMDPs when an initial set of belief states is known. However, no point-based work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of possible observations, there are only a finite number of observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is known. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how this solution can be further extended via recently developed symbolic methods to continuous state and observations to derive the minimal relevant observation partitioning for potentially correlated, multivariate observation spaces. We demonstrate proof-of-concept results on uni- and multi-variate state and observation steam plant control.
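The key insight can be made concrete with a discrete-observation stand-in. In a point-based backup at a fixed belief point b, an observation o matters only through which alpha-vector wins after back-projection. Observations that pick the same winner at every belief point in the set can therefore be merged without changing any backup at those points, so the number of relevant groups is bounded by the number of possible winner tuples rather than by the number of observations. The sketch below, with hypothetical names and an assumed layout T[a][s][s2] = P(s2 | s, a) and Z[a][s2][o] = P(o | s2, a), groups observations by that winner tuple. The paper instead derives such partitions symbolically for continuous, possibly multivariate observations.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def observation_partition(
    beliefs: List[Vector],
    alphas: List[Vector],
    T: List[List[Vector]],  # T[a][s][s2] = P(s2 | s, a)  (assumed layout)
    Z: List[List[Vector]],  # Z[a][s2][o] = P(o | s2, a)  (assumed layout)
    a: int,
) -> List[List[int]]:
    """Group observations that select the same alpha-vector at every belief point."""
    n_states = len(beliefs[0])
    n_obs = len(Z[a][0])
    groups: Dict[Tuple[int, ...], List[int]] = {}
    for o in range(n_obs):
        # Back-project each alpha-vector through (a, o).
        projections = [
            [
                sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2] for s2 in range(n_states))
                for s in range(n_states)
            ]
            for alpha in alphas
        ]
        # Signature: index of the winning projection at each belief point.
        signature = tuple(
            max(
                range(len(alphas)),
                key=lambda k: sum(bs * gs for bs, gs in zip(b, projections[k])),
            )
            for b in beliefs
        )
        groups.setdefault(signature, []).append(o)
    return list(groups.values())
```

Merging a group (summing its columns of Z) is safe at these belief points because the alpha-vector that wins for every member of the group also wins for their sum.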

continuous state and observation pomdp, observation space, symbolic dynamic programming

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Analytic Decision Analysis via Symbolic Dynamic Programming for Parameterized Hybrid MDPs

Kinathil, Shamin (Australian National University and Data61, CSIRO) | Soh, Harold (University of Toronto) | Sanner, Scott (University of Toronto)

AAAI Conferences, Jun-14-2017

For example, we may need to (i) perform inverse learning of the cost parameters of a multi-objective reward based on observed agent behavior; (ii) perform sensitivity analyses of policies to various parameter settings; or (iii) analyze and optimize policy performance as a function of policy parameters. When such problems have mixed discrete and continuous state and/or action spaces, they lead to parameterized hybrid MDPs (PHMDPs), which are often solved approximately via discretization, sampling, and/or local gradient methods (when optimization is involved). In this paper we combine two recent advances that allow for the first exact solution and optimization of PHMDPs. First, we show how each of the aforementioned use cases can be formalized as a PHMDP, which can then be solved via an extension of symbolic dynamic programming (SDP) even when the solution is piecewise nonlinear. Second, we leverage recent advances in non-convex solvers, which require a symbolic form of the objective function, for global optimization in (i), (ii), and (iii), using SDP to derive the symbolic solution for each PHMDP formalization. We demonstrate the efficacy and scalability of our optimal analytical framework on nonlinear examples of each of the aforementioned use cases.
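As a toy illustration of use case (ii), the sketch below runs finite-horizon value iteration on a small discrete MDP whose reward is linear in a single parameter theta, keeping each state's value as a set of lines in theta whose upper envelope is V_s(theta). The cross-sum step relies on the identity that a probability-weighted sum of maxima equals the maximum over one choice per successor. This is a simplified stand-in, not the paper's PHMDP solver: states and actions are discrete, transitions do not depend on theta, and no dominance pruning is done, so the line sets grow quickly with the horizon. The names parametric_value_iteration and value_at and the layout P[s][a][s2] are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

Line = Tuple[float, float]  # (slope, intercept): value(theta) = slope * theta + intercept


def parametric_value_iteration(
    P: List[List[List[float]]],  # P[s][a][s2] = transition probability (assumed layout)
    R0: List[List[float]],       # reward is R0[s][a] + theta * R1[s][a]
    R1: List[List[float]],
    gamma: float,
    horizon: int,
) -> List[List[Line]]:
    """Finite-horizon value iteration; V_s(theta) is the upper envelope of a set of lines."""
    n_states = len(P)
    V: List[List[Line]] = [[(0.0, 0.0)] for _ in range(n_states)]
    for _ in range(horizon):
        new_V: List[List[Line]] = []
        for s in range(n_states):
            lines: Set[Line] = set()
            for a in range(len(P[s])):
                succ = [s2 for s2 in range(n_states) if P[s][a][s2] > 0]
                # sum_s2 p * max_l l(theta) == max over one line per successor (cross-sum)
                for choice in product(*(V[s2] for s2 in succ)):
                    slope = R1[s][a] + gamma * sum(P[s][a][s2] * m for s2, (m, _) in zip(succ, choice))
                    icpt = R0[s][a] + gamma * sum(P[s][a][s2] * c for s2, (_, c) in zip(succ, choice))
                    lines.add((slope, icpt))
            # No dominance pruning: exact, but the set can grow quickly with the horizon.
            new_V.append(sorted(lines))
        V = new_V
    return V


def value_at(lines: List[Line], theta: float) -> float:
    """Evaluate the upper envelope at a concrete parameter value."""
    return max(m * theta + c for m, c in lines)
```

Points where the maximizing line changes are candidate parameter values at which the optimal horizon-H policy switches, which is the kind of question a sensitivity analysis asks; evaluating the envelope at any fixed theta reproduces ordinary value iteration for that theta.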

markov decision process, nullx, proceedings

AAAI Conferences

Twenty-Seventh International Conference on Automated Planning and Scheduling

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Overview (0.46)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Nonlinear Optimization and Symbolic Dynamic Programming for Parameterized Hybrid Markov Decision Processes

Kinathil, Shamin (Australian National University and Data61, CSIRO) | Soh, Harold (University of Toronto) | Sanner, Scott (University of Toronto)

AAAI Conferences, Feb-4-2017

It is often critical in real-world applications to: (i) perform inverse learning of the cost parameters of a multi-objective reward based on observed agent behavior; (ii) perform sensitivity analyses of policies to various parameter settings; and (iii) analyze and optimize policy performance as a function of policy parameters. When such problems have mixed discrete and continuous state and/or action spaces, they lead to parameterized hybrid MDPs (PHMDPs), which are often solved approximately via discretization, sampling, and/or local gradient methods (when optimization is involved). In this paper we combine two recent advances that allow for the first exact solution and optimization of PHMDPs. First, we show how each of the aforementioned use cases can be formalized as a PHMDP, which can then be solved via an extension of symbolic dynamic programming (SDP) even when the solution is piecewise nonlinear. Second, we leverage recent advances in non-convex solvers such as dReal and dOp, which offer δ-optimality guarantees for nonlinear problems given a symbolic function, for global optimization in (i), (ii), and (iii), using SDP to derive the symbolic solution for each PHMDP formalization. We demonstrate the efficacy and scalability of our framework by calculating the first known exact solutions to complex nonlinear examples of each of the aforementioned use cases.
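A δ-optimality guarantee means the returned point is certified to be within δ of the true global optimum. dReal and dOp achieve this for nonlinear symbolic functions with their own machinery, which is not reproduced here. As a much simpler illustration of what such a certificate looks like, the sketch below minimizes a univariate function by bisection branch-and-bound, assuming a known Lipschitz constant L so that f cannot fall below (f(a) + f(b))/2 - L(b - a)/2 anywhere on an interval [a, b]. The function name and parameters are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple


def lipschitz_minimize(
    f: Callable[[float], float], lo: float, hi: float, lipschitz: float, delta: float
) -> Tuple[float, float]:
    """Return (x, f(x)) with f(x) <= min over [lo, hi] of f + delta, given a valid Lipschitz constant."""

    def lower_bound(a: float, b: float, fa: float, fb: float) -> float:
        # If |f'| <= L on [a, b], f cannot dip below this value anywhere on [a, b].
        return (fa + fb) / 2 - lipschitz * (b - a) / 2

    f_lo, f_hi = f(lo), f(hi)
    best_x, best_f = (lo, f_lo) if f_lo <= f_hi else (hi, f_hi)
    heap: List[Tuple[float, float, float, float, float]] = [
        (lower_bound(lo, hi, f_lo, f_hi), lo, hi, f_lo, f_hi)
    ]
    while heap:
        bound, a, b, fa, fb = heapq.heappop(heap)
        if best_f - bound <= delta:
            break  # every remaining interval has a lower bound >= bound: delta-optimal
        m = (a + b) / 2
        fm = f(m)
        if fm < best_f:
            best_x, best_f = m, fm
        heapq.heappush(heap, (lower_bound(a, m, fa, fm), a, m, fa, fm))
        heapq.heappush(heap, (lower_bound(m, b, fm, fb), m, b, fm, fb))
    return best_x, best_f
```

The search stops once the best value found is within delta of the smallest remaining lower bound, which is what certifies delta-optimality; a Lipschitz constant that is too small silently voids that certificate.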

dynamic programming, optimization, sensitivity analysis

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Ontario > Toronto (0.47)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Overview (0.46)

Industry: Health & Medicine > Therapeutic Area (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.53)

Real-Time Symbolic Dynamic Programming

Vianna, Luis Gustavo Rocha (University of São Paulo) | Barros, Leliane N. de (University of São Paulo) | Sanner, Scott (NICTA and Australian National University)

AAAI Conferences, Mar-6-2015

Recent advances in Symbolic Dynamic Programming (SDP) combined with the extended algebraic decision diagram (XADD) have provided exact solutions for expressive subclasses of finite-horizon Hybrid Markov Decision Processes (HMDPs) with mixed continuous and discrete state and action parameters. Unfortunately, SDP suffers from two major drawbacks: (1) it solves for all states and can be intractable for many problems that inherently have large optimal XADD value function representations; and (2) it cannot maintain compact (pruned) XADD representations for domains with nonlinear dynamics and reward due to the need for nonlinear constraint checking. In this work, we simultaneously address both of these problems by introducing real-time SDP (RTSDP). RTSDP addresses (1) by focusing the solution and value representation only on regions reachable from a set of initial states, and (2) by using visited states as witnesses of reachable regions to assist in pruning irrelevant or unreachable (nonlinear) regions of the value function. As a result, RTSDP enjoys provable convergence over the set of initial states and substantial space and time savings over SDP, as we demonstrate in a variety of hybrid domains ranging from inventory to reservoir to traffic control.
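RTSDP builds on the idea behind real-time dynamic programming (RTDP): run trials from the initial states and apply Bellman backups only at the states those trials visit. The sketch below is plain tabular finite-horizon RTDP on a discrete MDP, not RTSDP itself: there are no XADDs, no continuous variables, and no witness-based pruning. Values start from the optimistic bound h * r_max so that greedy trials are drawn toward states whose values are still overestimated. The function name and the layout P[s][a][s2], R[s][a] are assumptions.

```python
import random
from typing import List


def finite_horizon_rtdp(
    P: List[List[List[float]]],  # P[s][a][s2] = transition probability (assumed layout)
    R: List[List[float]],        # R[s][a]     = immediate reward, assumed <= r_max
    horizon: int,
    s0: int,
    trials: int,
    r_max: float,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular RTDP: back up only the states visited by greedy trials from s0."""
    rng = random.Random(seed)
    n_states = len(P)
    # Optimistic (admissible) start: h remaining steps earn at most h * r_max.
    V = [[h * r_max for _ in range(n_states)] for h in range(horizon + 1)]

    def q_value(h: int, s: int, a: int) -> float:
        return R[s][a] + sum(p * V[h - 1][s2] for s2, p in enumerate(P[s][a]))

    for _ in range(trials):
        s = s0
        for h in range(horizon, 0, -1):
            best_a = max(range(len(P[s])), key=lambda a: q_value(h, s, a))
            V[h][s] = q_value(h, s, best_a)  # Bellman backup at the visited state only
            s = rng.choices(range(n_states), weights=P[s][best_a])[0]
    return V
```

States never reached by a greedy trial keep their initial bounds, which is where the savings come from; convergence is only guaranteed for the start state and the states its greedy policy reaches.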

artificial intelligence, dynamic programming, machine learning

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

South America > Brazil > São Paulo (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Africa > Togo (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Zamani, Zahra | Sanner, Scott | Poupart, Pascal | Kersting, Kristian

Neural Information Processing Systems, Dec-31-2012

Partially-observable Markov decision processes (POMDPs) provide a powerful model for real-world sequential decision-making problems. In recent years, point-based value iteration methods have proven to be extremely effective techniques for finding (approximately) optimal dynamic programming solutions to POMDPs when an initial set of belief states is known. However, no point-based work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of possible observations, there are only a finite number of observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is known. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how this solution can be further extended via recently developed symbolic methods to continuous state and observations to derive the minimal relevant observation partitioning for potentially correlated, multivariate observation spaces. We demonstrate proof-of-concept results on uni- and multi-variate state and observation steam plant control.

artificial intelligence, continuous observation, machine learning

Neural Information Processing Systems

Country:

Europe (0.46)
Oceania > Australia (0.28)
North America > Canada (0.28)

Industry:

Energy > Power Industry (0.36)
Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Symbolic Dynamic Programming for Continuous State and Action MDPs

Zamani, Zahra (ANU, The Australian National University; NICTA, National ICT Australia) | Sanner, Scott (NICTA and ANU) | Fang, Cheng (Department of Aeronautics and Astronautics, MIT)

AAAI Conferences, Jul-21-2012

Many real-world decision-theoretic planning problems are naturally modeled using both continuous state and action (CSA) spaces, yet little work has provided exact solutions for the case of continuous actions. In this work, we propose a symbolic dynamic programming (SDP) solution to obtain the optimal closed-form value function and policy for CSA-MDPs with multivariate continuous state and actions, discrete noise, piecewise linear dynamics, and piecewise linear (or restricted piecewise quadratic) reward. Our key contribution over previous SDP work is to show how the continuous action maximization step in the dynamic programming backup can be evaluated optimally and symbolically, a task which amounts to symbolic constrained optimization subject to unknown state parameters; we further integrate this technique to work with an efficient and compact data structure for SDP, the extended algebraic decision diagram (XADD). We demonstrate empirical results on a didactic nonlinear planning example and two domains from operations research to show the first automated exact solution to these problems.
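Why the continuous action maximization can be done exactly is easiest to see in one dimension. If Q(x, a) is linear in the action a on a piece whose legal actions form an interval [lb(x), ub(x)] with linear bounds, the maximum sits at ub(x) when the coefficient of a is non-negative and at lb(x) otherwise; substituting that bound gives a linear function of x, valid wherever lb(x) <= ub(x). The sketch below applies this piece by piece for a single state variable x and a single action a. It is a hypothetical illustration, not the paper's algorithm, which handles multivariate actions, restricted piecewise quadratic reward (where the stationary point of a quadratic piece is also a candidate), and XADD-based simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Lin = Tuple[float, float]  # (coef, const) represents coef * x + const


@dataclass
class Piece:
    """On this piece, Q(x, a) = q_x * x + q_a * a + q_0 for lb(x) <= a <= ub(x)."""
    q_x: float
    q_a: float
    q_0: float
    lb: Lin
    ub: Lin


def max_out_action(pieces: List[Piece]) -> List[Tuple[Lin, Lin]]:
    """Maximize each piece over a symbolically; return (feasibility, value) pairs.

    A piece contributes value(x) wherever feasibility(x) >= 0.
    """
    cases = []
    for p in pieces:
        # Q is linear in a, so its maximum over an interval sits at an endpoint.
        bound = p.ub if p.q_a >= 0 else p.lb
        # Substitute a := bound(x) to get a linear function of x.
        value = (p.q_x + p.q_a * bound[0], p.q_0 + p.q_a * bound[1])
        feasible = (p.ub[0] - p.lb[0], p.ub[1] - p.lb[1])  # ub(x) - lb(x) >= 0
        cases.append((feasible, value))
    return cases


def evaluate(cases: List[Tuple[Lin, Lin]], x: float) -> Optional[float]:
    """V(x) = max over the pieces whose action interval is non-empty at x."""
    values = [v[0] * x + v[1] for f, v in cases if f[0] * x + f[1] >= 0]
    return max(values) if values else None
```

Here the result is an unsimplified list of (feasibility, value) cases whose pointwise maximum is V(x); a fuller implementation would merge and prune those cases inside a decision diagram.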

artificial intelligence, machine learning, operation

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Africa > Togo (0.04)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Symbolic Dynamic Programming for Discrete and Continuous State MDPs

Sanner, Scott | Delgado, Karina Valdivia | de Barros, Leliane Nunes

arXiv.org Artificial Intelligence, Feb-14-2012

Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have so far been defined only for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we extend symbolic dynamic programming (SDP) techniques to provide optimal solutions for a vastly expanded class of DC-MDPs. To address the inherent combinatorial aspects of SDP, we introduce the XADD, a continuous-variable extension of the algebraic decision diagram (ADD) that maintains compact representations of the exact value function. Empirically, we demonstrate an implementation of SDP with XADDs on various DC-MDPs, showing the first optimal automated solutions to DC-MDPs with linear and nonlinear piecewise partitioned value functions and the advantages of constraint-based pruning for XADDs.
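To give a feel for constraint-based pruning, the sketch below uses a tiny XADD-like tree over one continuous variable x: decision nodes test x > c, and leaves hold linear functions of x. Pruning tracks the interval of x values that can reach each node, drops tests whose outcome that interval already decides, and collapses decisions whose two branches are identical. This is a hypothetical simplification: a real XADD also orders and shares nodes canonically, and with multivariate linear or nonlinear decisions the feasibility check needs a solver (for example, a linear-programming feasibility check) rather than an interval.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    coef: float   # leaf value is coef * x + const
    const: float


@dataclass(frozen=True)
class Decision:
    threshold: float  # the node tests x > threshold
    high: "Node"      # branch taken when the test is true
    low: "Node"       # branch taken when the test is false


Node = Union[Leaf, Decision]


def evaluate(node: Node, x: float) -> float:
    """Follow the decisions for a concrete x and evaluate the reached leaf."""
    while isinstance(node, Decision):
        node = node.high if x > node.threshold else node.low
    return node.coef * x + node.const


def prune(node: Node, lo: float = float("-inf"), hi: float = float("inf")) -> Node:
    """Drop decisions already implied by ancestor tests; x reaching `node` lies in (lo, hi]."""
    if isinstance(node, Leaf):
        return node
    if node.threshold >= hi:  # x <= hi <= threshold, so the test is always false
        return prune(node.low, lo, hi)
    if node.threshold <= lo:  # x > lo >= threshold, so the test is always true
        return prune(node.high, lo, hi)
    high = prune(node.high, node.threshold, hi)
    low = prune(node.low, lo, node.threshold)
    if high == low:  # identical branches make the test redundant
        return high
    return Decision(node.threshold, high, low)
```

For instance, a test x > 3 nested under the true branch of x > 5 is always true there, so pruning replaces it with its true branch while leaving every value of the function unchanged.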

artificial intelligence, bayesian inference, machine learning

arXiv.org Artificial Intelligence

1202.3762

Country: